CORE search results: 135 research outputs found

Two Tales of the World: Comparison of Widely Used World News Datasets GDELT and EventRegistry

Authors: An Jisun, Kwak Haewoon
Publication date: 07/03/2016

In this work, we compare GDELT and Event Registry, two services that monitor news articles worldwide and provide big data to researchers, in terms of scale, news sources, and news geography. We found significant differences in scale and news sources but, surprisingly, high similarity in news geography between the two datasets. Comment: To appear in ICWSM'1

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

#greysanatomy vs. #yankees: Demographics and Hashtag Use on Twitter

Authors: An Jisun, Weber Ingmar
Publication date: 07/03/2016

Demographics, in particular gender, age, and race, are key predictors of human behavior. Despite the significant effect that demographics have, most scientific studies using online social media do not consider them, mainly because such information is unavailable. In this work, we use state-of-the-art face analysis software to infer gender, age, and race from the profile images of 350K Twitter users from New York. For the period from November 1, 2014 to October 31, 2015, we study which hashtags are used by different demographic groups. Although we find considerable overlap among the most popular hashtags, there are also many group-specific hashtags. Comment: This is a preprint of an article appearing at ICWSM 201

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

Association for the Advancement of Artificial Intelligence: AAAI Publications

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment

Authors: Ahn Yong-Yeol, An Jisun, Kwak Haewoon
Publication date: 01/01/2018

Because word semantics can change substantially across communities and contexts, capturing domain-specific word semantics is an important challenge. Here, we propose SEMAXIS, a simple yet powerful framework that characterizes word semantics using many semantic axes in word-vector spaces, going beyond sentiment. We demonstrate that SEMAXIS can capture nuanced semantic representations in multiple online communities. We also show that, when the sentiment axis is examined, SEMAXIS outperforms state-of-the-art approaches in building domain-specific sentiment lexicons. Comment: Accepted at ACL 2018 as a full paper

arXiv.org e-Print Archive

Crossref

A systematic media frame analysis of 1.5 million New York Times articles from 2000 to 2017

Author: AN Jisun
KWAK Haewoon
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2020

Defense Advanced Research Projects Agency

Institutional Knowledge at Singapore Management University

Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech

Authors: An Jisun, Huang Fan, Kwak Haewoon
Publication date: 10/02/2023

Recent studies have warned that much online hate speech is implicit. Because of its subtle nature, explaining the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used to provide natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their quality by comparing them with human-generated NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.

arXiv.org e-Print Archive

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus

Authors: An Jisun, Kwak Haewoon, Park Kunwoo
Publication date: 19/04/2022

A conversation corpus is essential for building interactive AI applications. However, the demographics of the participants in such corpora remain largely underexplored, mainly because many corpora lack individual-level data. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic (age and sex) groups in the corpus. Comment: Accepted at AAAI ICWSM'2

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

Association for the Advancement of Artificial Intelligence: AAAI Publications